-
Notifications
You must be signed in to change notification settings - Fork 15.2k
[MLIR][SideEffects][MemoryEffects] Modified LICM to be more aggressive when checking movability of ops with MemWrite effects #155344
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
…t and modified LICM pass. Allows speculatable ops with 'Init' Memory Effects to be moved out of loops if op does not have other, non-Init, Memory Effects and no other operations within it's nested region(s) have Memory Effects that apply to the same resources as the original op.
Thank you for submitting a Pull Request (PR) to the LLVM Project! This PR will be automatically labeled and the relevant teams will be notified. If you wish to, you can add reviewers by using the "Reviewers" section on this page. If this is not working for you, it is probably because you do not have write permissions for the repository. In which case you can instead tag reviewers by name in a comment by using If you have received no comments on your PR for a week, you can request a review by "ping"ing the PR by adding a comment “Ping”. The common courtesy "ping" rate is once a week. Please remember that you are asking for valuable time from other developers. If you have further questions, they may be answered by the LLVM GitHub User Guide. You can also ask questions in a comment on this PR, on the LLVM Discord or on the forums. |
I'm wary of something like:
I suspect you'll move the store here, changing the value that the load is seeing. |
fair point. yes that will get moved out. The current LICM pass doesn't account for dead loops either and will move out the op if it has no mem effects AND is speculatable --> but since there's no effects it'll probably get removed by DCE whereas now the memWrites will force it to remain. I'm looking through the LoopLikeOpInterface to see if we can infer dead loops. Might end up being restricted to only loops with known/constant iterator ranges. Let me know if you have any suggestions for where else to look for a solution. |
…erbeikTT/llvm-project into mbagherbeik/mlir/LICM_improvements merging changes
Added more conditions. Ops with MemWrite memory effects will now be moved out of loops if:
|
Can you elaborate on why you chose to target writes instead of loads? |
if you mean "why writes are movable but not reads (loads)": this is a partial implementation to make sure we're on the same page in terms of implementation details. If yes, then adding similar functionality for moving reads and even taking into account stages so an op can do neat things like make itself immovable by adding MemRead<rA, 0> and MemWrite<rA, 1> while another op can still be moved if the stages are reversed. if you did mean loads, please elaborate as I'm not sure what you mean, specifically. |
Co-authored-by: Mehdi Amini <[email protected]>
Alrighty. It looks like the code changes so far are relatively up to standards. Mehdi provided a lot of useful feedback that I'll be addressing. I'll spend time over this weekend to make the major changes like adding read and stage support as well as making the walk more efficient. If the maintainers (@joker-eph) prefer a more incremental update approach let me know. As always, thanks again for spending time to look into this with me and the rest of the community. |
Significantly changed LICM pass
Looking forward to the feedback as this is a pretty wide departure from the previous method |
} | ||
func.return %final : vector<4x4xf32> | ||
} | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Many of the test cases in https://github.com/llvm/llvm-project/blob/main/mlir/test/Dialect/Affine/affine-loop-invariant-code-motion.mlir may be useful for these changes. We can delete the affine-licm pass if it's subsumed by LICM post this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I looked into the affine-LICM and currently the passes aren't quite the same, as the affine version doesn't check for speculability and most of the affine ops don't have the speculation interface which LICM requires. The ops require some modification to use this pass; it may be as simple as adding "AlwaysSpeculatable" to relevant ops but I can't speak to that with confidence.
You can test this locally with the following command:git-clang-format --diff origin/main HEAD --extensions cpp,h -- mlir/include/mlir/Interfaces/SideEffectInterfaces.h mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h mlir/lib/Interfaces/SideEffectInterfaces.cpp mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp mlir/test/lib/Dialect/Test/TestOps.h
View the diff from clang-format here.diff --git a/mlir/include/mlir/Interfaces/SideEffectInterfaces.h b/mlir/include/mlir/Interfaces/SideEffectInterfaces.h
index e2b3ba10e..cb705a06e 100644
--- a/mlir/include/mlir/Interfaces/SideEffectInterfaces.h
+++ b/mlir/include/mlir/Interfaces/SideEffectInterfaces.h
@@ -383,7 +383,9 @@ getMemoryEffectsSorted(Operation *op);
/// resource. An 'allocate' effect implies only allocation of the resource, and
/// not any visible mutation or dereference.
struct Allocate : public Effect::Base<Allocate> {
- Allocate() : Effect::Base<Allocate>() { this->priority = Priority::kAllocPriority; }
+ Allocate() : Effect::Base<Allocate>() {
+ this->priority = Priority::kAllocPriority;
+ }
};
/// The following effect indicates that the operation frees some resource that
diff --git a/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h b/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h
index 82581efda..ef7c72e66 100644
--- a/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h
+++ b/mlir/include/mlir/Transforms/LoopInvariantCodeMotionUtils.h
@@ -25,14 +25,16 @@ class Value;
/// Gathers potential conflicts on all memory resources used within loop
///
-/// Given a target loop and an op within it (or the loop op itself),
-/// gathers op's memory effects and flags potential resource conflicts
-/// in a map and then recurses into the op's regions to gather nested
-/// resource conflicts
+/// Given a target loop and an op within it (or the loop op itself),
+/// gathers op's memory effects and flags potential resource conflicts
+/// in a map and then recurses into the op's regions to gather nested
+/// resource conflicts
///
/// First call should use loop = someLoop and op = someLoop.getOperation()
-void gatherResourceConflicts(LoopLikeOpInterface loop, Operation *op,
- DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts);
+void gatherResourceConflicts(
+ LoopLikeOpInterface loop, Operation *op,
+ DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+ &resourceConflicts);
/// Given a list of regions, perform loop-invariant code motion. An operation is
/// loop-invariant if it depends only of values defined outside of the loop.
diff --git a/mlir/lib/Interfaces/SideEffectInterfaces.cpp b/mlir/lib/Interfaces/SideEffectInterfaces.cpp
index ec8618e24..6603f9193 100644
--- a/mlir/lib/Interfaces/SideEffectInterfaces.cpp
+++ b/mlir/lib/Interfaces/SideEffectInterfaces.cpp
@@ -330,19 +330,19 @@ mlir::MemoryEffects::getMemoryEffectsSorted(Operation *op) {
memInterface.getEffects(effectsSorted);
- auto sortEffects =
- [](llvm::SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
- llvm::stable_sort(effects, [](const MemoryEffects::EffectInstance &a,
- const MemoryEffects::EffectInstance &b) {
- if (a.getStage() < b.getStage())
- return true;
-
- if (a.getStage() == b.getStage())
- return a.getEffect()->getPriority() < b.getEffect()->getPriority();
-
- return false; // b before a
- });
- };
+ auto sortEffects =
+ [](llvm::SmallVectorImpl<MemoryEffects::EffectInstance> &effects) {
+ llvm::stable_sort(effects, [](const MemoryEffects::EffectInstance &a,
+ const MemoryEffects::EffectInstance &b) {
+ if (a.getStage() < b.getStage())
+ return true;
+
+ if (a.getStage() == b.getStage())
+ return a.getEffect()->getPriority() < b.getEffect()->getPriority();
+
+ return false; // b before a
+ });
+ };
sortEffects(effectsSorted);
return effectsSorted;
@@ -352,12 +352,12 @@ bool mlir::isMemoryEffectFree(Operation *op) {
if (auto memInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
if (!memInterface.hasNoEffect())
return false;
-
+
// If the op does not have recursive side effects, then it is memory effect
// free.
if (!op->hasTrait<OpTrait::HasRecursiveMemoryEffects>())
return true;
-
+
} else if (!op->hasTrait<OpTrait::HasRecursiveMemoryEffects>()) {
// Otherwise, if the op does not implement the memory effect interface and
// it does not have recursive side effects, then it cannot be known that the
diff --git a/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp b/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp
index 7b59b4abb..a2cbcc40e 100644
--- a/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp
+++ b/mlir/lib/Transforms/Utils/LoopInvariantCodeMotionUtils.cpp
@@ -60,18 +60,19 @@ static bool canBeHoisted(Operation *op,
op, [&](OpOperand &operand) { return definedOutside(operand.get()); });
}
-/// Merges srcEffect's Memory Effect on its resource into the
+/// Merges srcEffect's Memory Effect on its resource into the
/// resourceConflicts map, flagging resources if the srcEffect
/// results in a conflict
-static void mergeResource(
- DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts,
- const MemoryEffects::EffectInstance &srcEffect,
- bool srcHasConflict) {
+static void
+mergeResource(DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+ &resourceConflicts,
+ const MemoryEffects::EffectInstance &srcEffect,
+ bool srcHasConflict) {
TypeID srcResourceID = srcEffect.getResource()->getResourceID();
- bool srcIsAllocOrFree = isa<MemoryEffects::Allocate>(srcEffect.getEffect())
- || isa<MemoryEffects::Free>(srcEffect.getEffect());
+ bool srcIsAllocOrFree = isa<MemoryEffects::Allocate>(srcEffect.getEffect()) ||
+ isa<MemoryEffects::Free>(srcEffect.getEffect());
bool conflict = srcHasConflict || srcIsAllocOrFree;
@@ -79,7 +80,8 @@ static void mergeResource(
// if it doesn't already exist, create entry for resource in map
if (dstIt == resourceConflicts.end()) {
- resourceConflicts.insert(std::make_pair(srcResourceID, std::make_pair(conflict, srcEffect)));
+ resourceConflicts.insert(
+ std::make_pair(srcResourceID, std::make_pair(conflict, srcEffect)));
return;
}
@@ -93,10 +95,10 @@ static void mergeResource(
bool srcWrite = isa<MemoryEffects::Write>(srcEffect.getEffect());
bool dstRead = isa<MemoryEffects::Read>(dstEffect.getEffect());
bool readBeforeWrite = dstRead && srcWrite;
-
+
conflict = conflict || readBeforeWrite;
- dstIt->second =std::make_pair(conflict, srcEffect);
+ dstIt->second = std::make_pair(conflict, srcEffect);
}
/// Returns true if any of op's OpOperands are defined outside of loopLike
@@ -113,14 +115,16 @@ static bool hasLoopVariantInput(LoopLikeOpInterface loopLike, Operation *op) {
/// flagged as having a conflict within the resourceConflicts map
/// (b) op doesn't have a MemoryEffectOpInterface or has one but
/// without any specific effects
-static bool mayHaveMemoryEffectConflict(Operation *op,
- DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts) {
+static bool mayHaveMemoryEffectConflict(
+ Operation *op,
+ DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+ &resourceConflicts) {
auto memInterface = dyn_cast<MemoryEffectOpInterface>(op);
-
+
// op does not implement the memory effect op interface
// shouldn't be flagged as movable to be conservative
- if (!memInterface)
+ if (!memInterface)
return true;
// gather all effects on op
@@ -128,7 +132,7 @@ static bool mayHaveMemoryEffectConflict(Operation *op,
memInterface.getEffects(effects);
// op has interface but no effects, be conservative
- if (effects.empty())
+ if (effects.empty())
return true;
// RFC moving ops with HasRecursiveMemoryEffects that have nested ops
@@ -138,10 +142,10 @@ static bool mayHaveMemoryEffectConflict(Operation *op,
for (const MemoryEffects::EffectInstance &effect : effects) {
auto resourceID = effect.getResource()->getResourceID();
- auto resConIt = resourceConflicts.find(resourceID);
+ auto resConIt = resourceConflicts.find(resourceID);
if (resConIt == resourceConflicts.end())
return true; // RFC realistically shouldn't reach here but just in case?
-
+
bool hasConflict = resConIt->second.first;
if (hasConflict)
return true;
@@ -150,13 +154,15 @@ static bool mayHaveMemoryEffectConflict(Operation *op,
return false;
}
-void mlir::gatherResourceConflicts(LoopLikeOpInterface loopLike, Operation *op,
- DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> &resourceConflicts) {
+void mlir::gatherResourceConflicts(
+ LoopLikeOpInterface loopLike, Operation *op,
+ DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+ &resourceConflicts) {
if (auto memInterface = dyn_cast<MemoryEffectOpInterface>(op)) {
// gather all effects on op
llvm::SmallVector<MemoryEffects::EffectInstance> effects =
- MemoryEffects::getMemoryEffectsSorted(op);
+ MemoryEffects::getMemoryEffectsSorted(op);
if (!effects.empty()) {
// any variant input to the op could be the data source
@@ -166,7 +172,7 @@ void mlir::gatherResourceConflicts(LoopLikeOpInterface loopLike, Operation *op,
for (const MemoryEffects::EffectInstance &effect : effects) {
bool conflict = false;
bool isWrite = isa<MemoryEffects::Write>(effect.getEffect());
-
+
// all writes to a resource that follow a read on any other resource
// have to be considered a conflict as guaranteeing that the read
// is invariant and won't affect the write requires more robust logic
@@ -183,8 +189,8 @@ void mlir::gatherResourceConflicts(LoopLikeOpInterface loopLike, Operation *op,
}
for (Region ®ion : op->getRegions())
- for (Operation &opInner : region.getOps())
- gatherResourceConflicts(loopLike, &opInner, resourceConflicts);
+ for (Operation &opInner : region.getOps())
+ gatherResourceConflicts(loopLike, &opInner, resourceConflicts);
}
size_t mlir::moveLoopInvariantCode(
@@ -205,8 +211,10 @@ size_t mlir::moveLoopInvariantCode(
// continuous region --> need to add fork checking
//
// loop "do" and "then" regions also merged
- DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>> resourceConflicts;
- mlir::gatherResourceConflicts(loopLike, loopLike.getOperation(), resourceConflicts);
+ DenseMap<TypeID, std::pair<bool, MemoryEffects::EffectInstance>>
+ resourceConflicts;
+ mlir::gatherResourceConflicts(loopLike, loopLike.getOperation(),
+ resourceConflicts);
auto regions = loopLike.getLoopRegions();
for (Region *region : regions) {
@@ -231,12 +239,12 @@ size_t mlir::moveLoopInvariantCode(
LDBG() << "Checking op: "
<< OpWithFlags(op, OpPrintingFlags().skipRegions());
- bool noMemoryConflicts = isMemoryEffectFree(op)
- || !mayHaveMemoryEffectConflict(op, resourceConflicts);
+ bool noMemoryConflicts =
+ isMemoryEffectFree(op) ||
+ !mayHaveMemoryEffectConflict(op, resourceConflicts);
- if (!noMemoryConflicts
- || !shouldMoveOutOfRegion(op, region)
- || !canBeHoisted(op, definedOutside))
+ if (!noMemoryConflicts || !shouldMoveOutOfRegion(op, region) ||
+ !canBeHoisted(op, definedOutside))
continue;
LDBG() << "Moving loop-invariant op: " << *op;
@@ -260,9 +268,7 @@ size_t mlir::moveLoopInvariantCode(LoopLikeOpInterface loopLike) {
[&](Value value, Region *) {
return loopLike.isDefinedOutsideOfLoop(value);
},
- [&](Operation *op, Region *) {
- return isSpeculatable(op);
- },
+ [&](Operation *op, Region *) { return isSpeculatable(op); },
[&](Operation *op, Region *) { loopLike.moveOutOfLoop(op); });
}
|
if (effects.empty()) | ||
return true; | ||
|
||
// RFC moving ops with HasRecursiveMemoryEffects that have nested ops |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC? Is this a TODO?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kind of, yes. HasRecursiveMemoryEffects still works as it previously did through the isMemoryEffectFree() path but I wanted to discuss how to add support through this part of the path before implementing it.
The simplest approach for an op with this trait is to recurse down the op's regions and if all nested ops have the MemEffInterface and are conflict free, and the op is speculatable, then the op is movable. The part I'm fuzzy on is whether all nested ops should be required to be speculatable too or if that doesn't matter
func.func @move_single_resource_write_dominant() attributes {} { | ||
%c0_i32 = arith.constant 0 : i32 | ||
%c1_i32 = arith.constant 10 : i32 | ||
%c2_i32 = arith.constant 1 : i32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please name the variable according to the constant: reading the loops it is misleading right now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I made my constant naming consistent with the stored values. On that note, I was looking for a way to arbitrarily tag ops, in this case the loops, in IR to use with the checker but couldn't find anything. I ended up using different constants for the loops' upper bounds so we can check exactly which loop was moved where instead. That should make the tests easier to understand.
%c0_i32 = arith.constant 0 : i32 | ||
%c1_i32 = arith.constant 10 : i32 | ||
%c2_i32 = arith.constant 1 : i32 | ||
%c0_i32_0 = arith.constant 0 : i32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we have a duplicated 0 here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// CHECK: "test.test_effects_write_A"() : () -> () | ||
|
||
"test.test_effects_read_A"() : () -> () | ||
"test.test_effects_write_A"() : () -> () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is W->R movable but not R->W?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R->W isn't movable since the value of the read will be different between iterations 1 and 2.
For the flipped W->R of this case, there are no inputs to write_A and it has no read effects so we can infer that the write data has to be a constant within the op.
The read_A op has no operands or results or write effects so I can see an argument here that the read value doesn't get used by anything and the op can be moved. Is that an assumption that we can safely make in these cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
R->W isn't movable since the value of the read will be different between iterations 1 and 2.
Sure, but the sequence R->W is indempotent, isn't it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's safe to assume that, because test_effects_read_A() doesn't take any input and doesn't have a result, the read data "isn't used," It would be idempotent.
If that can't be assumed, then the conservative option is to flag it as a conflict on the resource.
If the assumption CAN be made, I can check if the read data "is used" when mapping conflicts.
// CHECK: "test.test_effects_write_EF"() : () -> () | ||
// CHECK: "test.test_effects_read_EF"() : () -> () | ||
|
||
// Both of these should be moved out of their parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both of these what? Loops? Where is it checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in previous comments
%1 = arith.cmpi slt, %arg0, %c3_i32 : i32 | ||
scf.if %1 { | ||
%c0_i32_0 = arith.constant 0 : i32 | ||
%c0_i32_1 = arith.constant 0 : i32 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove redundant constant please, we already have 0 available in this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
// CHECK: "test.test_effects_read_B"() : () -> () | ||
// CHECK: scf.for | ||
// CHECK: scf.for | ||
// CHECK: scf.for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What this is checking isn't clear to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in previous comments
// CHECK: "test.test_effects_write_A"() : () -> () | ||
// CHECK: "test.test_effects_read_A"() : () -> () | ||
|
||
// Loop should be moved out of parent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is it checked?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched to using different constants for the loops' upper bounds so we can check exactly which loop was moved where
} | ||
else { | ||
// CHECK: "test.test_effects_write_F"() : () -> () | ||
// CHECK: "test.test_effects_read_F"() : () -> () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the intent for these checkes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this particular check was to make sure that these ops aren't moved out of a non-loop region, and that their presence here will cause conflicts on resources that make ops in the regions above it unmovable
|
||
%input = arith.constant 7 : index | ||
"test.test_effects_write_A_with_input"(%input) : (index) -> () | ||
"test.test_effects_read_A"() : () -> () |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't %input a loop invariant, and so why aren't all these moved out?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pass checks if the operands are defined inside/outside the loop. The arith.constant is moved out by LICM and a second LICM pass will also move the test ops. More complex logic for detecting if the op that defines the input in the loop is invariant or not could be added at some point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The pass checks if the operands are defined inside/outside the loop
LICM is a worklist driven algorithm, and when an operation is moved out, its users are reprocessed, so what's up here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue is that the conflicts are mapped out at loop level before we start moving things. After something is moved, the conflict map isn't recreated and a stale one is used for the rest of the operations. I added a fix by wrapping the main part of moveLoopInvariantCode() within a while loop where it'll repeatedly analyze the loop until no more moves are made. Let me know what your thoughts are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That may be reasonable considering the implementation is very iterative. A more efficient one may compute iteratively what can be moved and keep the state updated, but that's out-of-reach for now.
I am more curious about where this is all leading right now in practice, considering that the algorithm does not check for loop being executed at least once, and so is restricted to speculatable operations, which is not compatible with "memory write" in general.
We're gonna pay some compile-time, we should know it'll serve actual use-cases. I can't picture one where we use scf.for and memref load/store right now for example?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping here? That's probably one of my main questions at the moment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with both your points regarding dead loop checking and being restricted to speculatable operations. I misunderstood why the movable ops had to be both effect free and speculatable in the old LICM.
I think we need to split up the condition so there's 2 separate checks:
- shouldMoveOutSpeculatable --> isMemoryEffectFree && isSpeculatable
- shouldMoveOutMemoryEffects --> !mayHaveMemoryEffectConflict && ( !hasSpeculatableOpInterface || isSpeculatable)
if either is true, the op can be moved.
I rebased so I can use getStaticTripCount() and I'm in the process of adding dead loop checking along with the above changes. Let me know if this is on the right track.
Regarding use-case, I still have to clean-up the test-cases so I'll add one that's as close to my own use-case as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the latest changes addressed all your comments. looking forward to any/all feedback
return true; | ||
} | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I coincidentally worked on trip count this weekend: #158679
This is super tricky to get right when taking overflow into account (your logic is likely not correct for all possible cases here: just think that you're not accounting for unsigned flagged scf.for for example).
So I suggest rebasing on my PR after it lands, and use getStaticTripCount() == 0
:)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fantastic! Will rebase when it's available. Accounting for strange cases was one of my main worries after you pointed out the unsigned flag issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It landed FYI
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I used it in addition to isZeroTrip() to confirm liveness
} | ||
} | ||
|
||
numMovedTotal += numMoved; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numMovedTotal += numMoved; | |
numMovedTotal += numMoved; | |
LDBG() << "Finishing LICM iteration " << iteration++ << " moved " << numMoved << " ops, total is now " << numMovedTotal; |
(need to add int iteration = 0;
before the loop as well)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
great suggestion: added
[&](Operation *op, Region *) { | ||
return isMemoryEffectFree(op) && isSpeculatable(op); | ||
}, | ||
[&](Operation *op, Region *) { return isSpeculatable(op); }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit weird as it looks like a legality criteria but the callback is "shouldMove..." which looks like a profitability criteria instead.
I would think that the two are orthogonal.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the current check in the loop is rely on something being always speculatable, while I would think we should be able to improve it to move non-speculatable operations out of the loop when we know it'll execute at least once?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial understanding based on how the pass was written was that the op needs to be both speculatable and free of memory effects/conflicts... But, as I type this, it just clicked that the old LICM was centred entirely around conditional speculability and the mem effect check was a sub-condition for that 🤦
So there should be 2 pathways to moving an op out. if either is true, op is movable:
- isMemoryEffectFree(op) && isSpeculatable(op)
- !mayHaveMemoryEffectConflict(op, ...) && opDoesNotHaveConditionalSpeculabilityInterface
The reason for opDoesNotHaveConditionalSpeculabilityInterface is that if the interface is there, if the op speculatable, based on its definition, it shouldn't have side-effects and, if it's NotSpeculatable, then it may have undefined behaviour and shouldn't be moved.
Is that sufficient or do we need to check for anything else?
def TestEffectsReadAWriteB : TEST_Op<"test_effects_read_A_write_B", | ||
[MemoryEffects<[MemRead<TestResourceA>, | ||
MemWrite<TestResourceB>]>, | ||
AlwaysSpeculatable]>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm slightly confused by the notion of a write that can be speculated, can you clarify why this is OK?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
major misunderstanding on my part from how the pass was originally structured when I first started tinkering with it.
Speculatable trait has been removed from all of the new test ops
Revised how pass works, can now move code under 2 separate conditions:
Other changes:
|
|
||
auto condSpecInterface = dyn_cast<ConditionallySpeculatable>(op); | ||
|
||
// if op implements ConditionallySpeculatable interface, must be speculatable! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC: I put this in as a pre-caution
A more conservative approach which would make more sense (since speculatable ops aren't meant to have side-effects) is to simply return true if the interface is present
// free. | ||
// A potential solution is to recursively gather all resources on all | ||
// contained ops and then run the for-loop further below. Requires discussions | ||
// re: obscure corner cases. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RFC: when an op reaches this part of the method, would it be correct to add an if statement on line 162 where it will select getEffects()/getEffectsRecursively() based on if it HasRecursiveMemoryEffects and move the op and it's nested regions?
@joker-eph Sorry for the long delay. I finally got around to making the requested changes and cleaning up the code. I think splitting up the movability checks into 2, 1 for speculability interface and 1 for memory effects for Live Loops, addresses the main concern from last time. I hope the fixed test-cases provide are more clear now and provide more value in terms of showing users how the pass is meant to work |
LICM pass has been made more aggressive when evaluating what ops it can move out of loop regions.
Ops with MemWrite memory effects will now be moved out of loops if: